0.0 Brief Introduction of the Project
0.1 Project Recap
0.2 Project Difficulties
1.0 Story of President Trump and Obama
1.1 What are the first impressions of Trump and Obama on Twitter?
1.2 What are their attitudes?
1.3 What are their styles?
1.4 What are their connections?
1.5 How do Trump's followers behave?
2.0 Methods We Utilize for Analysis and Visualization
How we overcome the project difficulties
2.1 Scraping
2.2 Data Cleaning
2.3 Sentiment Analyzer
2.4 Topic Modelling
2.41 Wordcloud Masking
2.42 Topic Modelling Graph
2.43 Network Graph Analysis
3.0 Conclusion
What do we do in this project?
Tweets sent by Presidents Obama and Trump from 2012 to 2019 have been scraped and analyzed. We intentionally selected the same period for both because we want to present a vivid profile of each man: not merely the image of a president during his tenure, but also the guy without his crown before or after his presidency.
We scraped the data from Twitter, then cleaned, analyzed, and visualized it. For the analysis we conducted sentiment analysis and topic modeling, and for the presentation we used a variety of methods to visualize the results.
How do we acquire datasets?
We used twint to scrape the tweets from Twitter. It is an advanced Twitter scraping and OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, tweets and more while evading most API limitations. We collected all the information related to the tweets of Obama and Trump.
Trump Frequently used words:
Great, People, Thank, Country
Obama Frequently used words:
President, American, Health, Job
Implications:
According to a survey, most men say about 7,000 words a day. Are Trump and Obama telling the truth? It is hard to say. But can we get a glimpse of who they are from the words they use most? Of course, yes!
Look at him! Why does Trump pout? Maybe he likes to pout when he is thinking about how to build a great America again.
What about Obama? He is kind of a practical guy because he likes to talk about livelihood issues and tries his utmost to promote his healthcare plan.
Anyway, they are cute in different ways!
Popular topics among Trump’s tweets:
Implications:
Hahaaa! See, my instinct about Trump is so correct! Trump cares about the election, the election, and nothing but the election!
Considering the relevant terms, we can see Trump aspires to "make America great again". He also says "thank" a lot. I can imagine that the presidential election journey must have been full of sweet and bitter moments, and he must appreciate everyone who helped him win.
Popular topics among Obama’s tweets:
Obama loves to talk about violence most.
For the relevant terms, we can see that "congress", "tax", and "gun" are mentioned a lot.
Positive words: Trump has tweeted positively about America, the White House, and voting, while Obama has tweeted positively about law, healthcare, immigration, and the economy.
Negative words: Trump has tweeted negatively about Hillary Clinton, fake, Russia, and China, while Obama has tweeted negatively about violence.
Implications:
Trump constantly draws a clear distinction between love and hate in his heart.
He loves America and the White House. But he also likes to say bad things about Hillary Clinton. How naughty he is! Although China has shown robust economic growth, it seems that Trump does not praise it much. Furthermore, of course, on Trump's dislike list we cannot forget FAKE NEWS!
On the other hand, Obama is a peace-and-love guy who holds a positive attitude toward the law, healthcare, immigration, and the economy but hates violence.
Specific words chosen to analyze Trump and Obama: "China", "America", "fake", "immigrant", "job":
Five keywords have been chosen to dig deeper into the content of the tweets. Both Trump and Obama love to talk about America. Just as Abraham Lincoln spoke of government "of the people, by the people, for the people", they both think and talk about America and Americans' lives day and night.
From Obama's presidency to Trump's, we can see that the president talks about China more. As a Chinese, I am proud that China plays a role on the international stage, and hopefully China and America can cooperate more in the future. (Well, that is a little bit off topic.) Getting back on track, we can see that jobs are talked about a lot by both Trump and Obama. Solving job problems is not an easy task!
Tweets have been categorized according to whether they contain URLs, photos, or mentions, or are text only.
Trump is kind of a social guy who loves to mention people.
However, Obama does not want to involve too many people on Twitter but loves to use URLs. Using another site to deliver a large amount of information might be a great strategy, mightn't it?
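The categorization described above can be sketched roughly as follows. This is only an illustration, not the project's actual code: the function name `categorize_tweet`, the precedence order (URL, then photo, then mention, then text only), and the sample rows are all assumptions, and each tweet is assumed to carry `urls`, `photos`, and `mentions` as lists.

```python
from collections import Counter

def categorize_tweet(urls, photos, mentions):
    """Return a single content label for one tweet.

    The precedence url > photo > mention > text only is an assumption
    of this sketch; the original analysis may have counted differently.
    """
    if urls:
        return "url"
    if photos:
        return "photo"
    if mentions:
        return "mention"
    return "text only"

# Hypothetical rows, shaped like scraped tweets with list-valued fields.
sample_tweets = [
    {"urls": ["https://example.com"], "photos": [], "mentions": []},
    {"urls": [], "photos": [], "mentions": ["nytimes"]},
    {"urls": [], "photos": [], "mentions": []},
    {"urls": [], "photos": ["https://example.com/a.jpg"], "mentions": ["nytimes"]},
]

# Count how many tweets fall into each category.
category_counts = Counter(
    categorize_tweet(t["urls"], t["photos"], t["mentions"]) for t in sample_tweets
)
print(category_counts)
```

Comparing the resulting counts for each president gives the kind of style breakdown discussed here.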
As we can see above,
Trump loves to tweet whether he is president or not. Very surprisingly, before 2017, when he was not yet president, he tweeted a lot more than he did after he became president. I think it is perhaps because his public relations team asked him to be more careful with his words. Who knows?
However, Obama stayed active during his presidency and has enjoyed his retired life since then. He has barely tweeted since his presidency ended. Hey, Obama, we miss you so much on Twitter!
It seems that the social media platform was created for a person like Trump.
It is fair to say that Trump cannot live without Twitter. He tweets in the early morning and also at night. I can even imagine Trump in bed, holding his iPhone in one hand to write a tweet and his wife in the other, saying, "I am almost done."
However, Obama is like a good student who is always self-disciplined and never spends unnecessary time on Twitter. He never tweets late at night or in the early morning. After work, he might spend a lot of time with his family and be a good husband and a good father.
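A time-of-day comparison like the one above could be computed along these lines. This is a minimal sketch under assumptions: the helper name `tweets_per_hour` is hypothetical, the timestamps are assumed to be `HH:MM:SS` strings, and the sample times are invented. The hours are in whatever timezone the scraped data uses, so they may need shifting before drawing conclusions about anyone's bedtime.

```python
from collections import Counter

def tweets_per_hour(times):
    """Count tweets for each hour of the day (0-23) from 'HH:MM:SS' strings."""
    counts = Counter(int(t.split(":")[0]) for t in times)
    # Return a fixed-length list so that quiet hours show up as zeros.
    return [counts.get(hour, 0) for hour in range(24)]

# Hypothetical timestamps for illustration only.
sample_times = ["06:12:45", "06:58:01", "23:30:10", "14:05:00"]
hourly = tweets_per_hour(sample_times)
print(hourly[6], hourly[14], hourly[23])
```

Plotting the 24 values for each president side by side makes the early-morning and late-night habits easy to compare.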
Surprisingly, among all the accounts mentioned by the two presidents, both Trump and Obama maintain a close relationship with nytimes. It seems that the New York Times shares the same opinions as the two.
Note: We select five keywords from the tweets, "Immigrant", "Fake", "China", "American", and "Job", and analyze the followers' reactions to them. Because the number and content of the tweets sent by the two presidents differ, we present only Trump here instead of comparing the two, which keeps the analysis reasonable and simple to understand, although the code in the other file is written for both.
From the graph above, we can see that
Netizens love to read Trump's tweets about America and fake news (yes, people love to watch the fight between the president and CNN), and then jobs.
Across the three metrics (likes, retweets, and replies), the ranking is typically similar to the one described above, which suggests that the followers are active ones and that Trump does not buy fake followers to promote his tweets.
In terms of the followers' actions, we can see that most people prefer to click the "like" button rather than type out a comment. Even if you are a president, we just do not want to talk to you! :)
Looking at the nine graphs at the upper left, we can see positive relationships among the numbers of likes, retweets, and replies.
Looking at the graphs in the 3rd column, 1st to 3rd rows, we can see that the numbers of likes and replies are both greater than the number of retweets, which may mean that some followers love Trump in a clandestine way: they do not want other people to notice that they follow Trump and speak up for him.
Considering the graphs in the 6th and 7th columns of the 1st row, we can see that tweets containing "China" receive fewer replies than tweets that do not, while for tweets containing “America” the situation is the other way around.
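The keyword-level follower reactions discussed in this section could be summarized roughly as below. This is a sketch, not the code from the other file: the helper `mean_engagement` is hypothetical, the field names `tweet`, `likes_count`, `retweets_count`, and `replies_count` are assumed, and the sample numbers are invented purely for illustration.

```python
def mean_engagement(tweets, keyword):
    """Average likes, retweets and replies over tweets whose text contains keyword.

    Matching is a case-insensitive substring test. Returns None when
    no tweet contains the keyword.
    """
    matched = [t for t in tweets if keyword.lower() in t["tweet"].lower()]
    if not matched:
        return None
    n = len(matched)
    return {
        "likes": sum(t["likes_count"] for t in matched) / n,
        "retweets": sum(t["retweets_count"] for t in matched) / n,
        "replies": sum(t["replies_count"] for t in matched) / n,
        "tweets": n,
    }

# Invented sample rows; the counts are not real data.
sample = [
    {"tweet": "China trade deal", "likes_count": 100, "retweets_count": 40, "replies_count": 10},
    {"tweet": "Jobs in America", "likes_count": 300, "retweets_count": 100, "replies_count": 50},
    {"tweet": "America first", "likes_count": 500, "retweets_count": 200, "replies_count": 70},
]

america = mean_engagement(sample, "america")
print(america)
```

Running the helper once per keyword gives one row of averages per topic, which can then be charted.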
import twint

print("Fetching Tweets")
c = twint.Config()
# choose username (optional)
c.Username = "realDonaldTrump"
# choose search term (optional); the OR operator must be written in upper case
# c.Search = "insert search term here"
c.Search = "immigration OR emigration OR employment OR migration OR citizenship OR law OR immigrants OR in-migration OR deport OR deportation OR policy OR labour OR labor OR H1B OR visa OR PR OR work permit"
# store the scraped tweets in a CSV file
c.Store_csv = True
# change the name of the csv file
c.Output = "tweets_test.csv"
twint.run.Search(c)
import re
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import regexp_tokenize

# df is the data frame of scraped tweets loaded earlier
def standardize_report(pid, report='tweet'):
    # Print the metadata of the selected tweet
    row = df.loc[pid]
    print(row['conversation_id'], row['created_at'], row['date'], row['time'], row['timezone'], row['user_id'], '\n')
    # Extract the tweet text from the data frame
    s = df.at[pid, report]
    # Convert text to lower case
    s = s.lower()
    # Remove unnecessary punctuation
    s = re.sub('[,()]', '', s)
    # Regular expression tokenization: split on spaces and periods
    words = regexp_tokenize(s, '[^. ]+')
    # Filter stop words (and the contraction "he's") from the tokenized words
    stop_words = set(stopwords.words('english')) | {"he's"}
    words = [word for word in words if word not in stop_words]
    # Lemmatization
    lmtzr = WordNetLemmatizer()
    words = [lmtzr.lemmatize(word) for word in words]
    return words
def remove_links(tweet):
    '''Takes a string and removes web links from it'''
    tweet = re.sub(r'http\S+', '', tweet)  # remove http links
    tweet = re.sub(r'bit\.ly/\S+', '', tweet)  # remove bitly links
    tweet = tweet.replace('[link]', '')  # remove [link] placeholders
    return tweet
def remove_users(tweet):
    '''Takes a string and removes retweet and @user information'''
    tweet = re.sub(r'(RT\s@[A-Za-z]+[A-Za-z0-9_-]+)', '', tweet)  # remove retweet
    tweet = re.sub(r'(@[A-Za-z]+[A-Za-z0-9_-]+)', '', tweet)  # remove tweeted at
    return tweet
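As a rough illustration of how the cleaning steps above might be chained, here is a self-contained sketch. It is not the project's pipeline: the helper `clean_tweet` and its patterns are assumptions (they use `\w` and an optional trailing colon, which differs slightly from the functions above), and the sample tweet is invented.

```python
import re

# Patterns are assumptions of this sketch, simplified from the functions above.
URL_RE = re.compile(r"http\S+|bit\.ly/\S+")
RETWEET_RE = re.compile(r"RT\s@\w+:?")
MENTION_RE = re.compile(r"@\w+")

def clean_tweet(tweet):
    """Strip links, retweet prefixes and @mentions, then collapse whitespace."""
    tweet = URL_RE.sub("", tweet)
    tweet = RETWEET_RE.sub("", tweet)
    tweet = MENTION_RE.sub("", tweet)
    # Removing pieces leaves double spaces behind; normalize them.
    return " ".join(tweet.split())

cleaned = clean_tweet("RT @nytimes: Big news https://t.co/abc via @someone today")
print(cleaned)
```

Links are removed first so that an `@` inside a URL is never mistaken for a mention.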
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# function to compute the sentiment scores
# of the sentence.
def sentiment_scores(sentence):
    # Create a SentimentIntensityAnalyzer object.
    sid_obj = SentimentIntensityAnalyzer()
    # The polarity_scores method of the SentimentIntensityAnalyzer
    # object returns a sentiment dictionary
    # which contains pos, neg, neu, and compound scores.
    sentiment_dict = sid_obj.polarity_scores(sentence)
    return sentiment_dict
We used the VADER sentiment analysis library (vaderSentiment) to perform sentiment analysis on the tweets that we fetched. It reports four scores: positive (pos), negative (neg), neutral (neu), and compound.
The first three scores give the proportions of the text that are positive, negative, and neutral. The compound score is a single normalized value between -1 and +1, and it is the score used to decide the sentiment of a sentence.
Using these scores, we created a word cloud for each president and classified the tweets by sentiment.
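Turning a compound score into a label might look like the sketch below. The function name `label_sentiment` is hypothetical, and the cut-off of 0.05 is the conventional threshold suggested in the VADER documentation, assumed here rather than confirmed as the value this project used. The score dictionaries are invented examples shaped like the analyzer's output.

```python
def label_sentiment(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a sentiment label.

    The +/-0.05 threshold is an assumption borrowed from common
    VADER usage, not a value taken from this project.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Invented score dictionaries, shaped like the analyzer's output.
sample_scores = [
    {"neg": 0.0, "neu": 0.4, "pos": 0.6, "compound": 0.6},
    {"neg": 0.5, "neu": 0.5, "pos": 0.0, "compound": -0.4},
    {"neg": 0.0, "neu": 1.0, "pos": 0.0, "compound": 0.0},
]

labels = [label_sentiment(s["compound"]) for s in sample_scores]
print(labels)
```

Tweets grouped by these labels can then feed separate positive and negative word clouds.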
import os
from os import path

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from wordcloud import WordCloud, STOPWORDS

# get data directory (using getcwd() is needed to support running example in generated IPython notebook)
d = path.dirname(__file__) if "__file__" in locals() else os.getcwd()
# Read the whole text.
text = str(trump_words)
# read the mask image (the Trump picture stored next to this file)
trump_mask = np.array(Image.open(path.join(d, "trump3.jfif")))
stopwords = set(STOPWORDS)
stopwords.add("said")
wc = WordCloud(background_color="white", max_words=2000, mask=trump_mask,
               stopwords=stopwords, contour_width=3, contour_color='steelblue')
# generate word cloud
wc.generate(text)
# store to file
wc.to_file(path.join(d, "trump_bnw_wordcloud.png"))
# show the word cloud
plt.imshow(wc, interpolation='bilinear')
plt.axis("off")
# show the mask image in a second figure
plt.figure()
plt.imshow(trump_mask, cmap=plt.cm.gray, interpolation='bilinear')
plt.axis("off")
plt.show()
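# Note: in pyLDAvis 3.x this module is named pyLDAvis.gensim_models
import pyLDAvis.gensim
pyLDAvis.enable_notebook()
# lda, corpus_lda and dictionary come from the topic model fitted earlier
panel = pyLDAvis.gensim.prepare(lda, corpus_lda, dictionary, mds='tsne')
panel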
We used pyLDAvis to plot the topics, showing the top 30 terms for these topics. Hovering over a word enlarges some of the bubbles, which reflects the presence of that word in those topics.
More importantly, hovering over any topic displays an updated list of words with two visually different bars. The blue bar indicates the overall frequency of the word in the corpus, and the red bar indicates the estimated frequency of the word within the selected topic.
After analyzing the topics and their words, we manually came up with four topics.
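# Flatten the per-tweet mention lists into one list of user handles
mentions = [name.strip() for name_li in df["mentions"].values for name in name_li]
# Count how often each handle is mentioned, most frequent first
mentions = pd.DataFrame([mentions, [1] * len(mentions)]).T.groupby(0).count().sort_values(by=1, ascending=False)
# Exclude the account's own handle
mentions.drop(["realdonaldtrump"], inplace=True)
mentions.columns = ['mention_count']
# Assign a numeric id to each handle for the network graph
mentions['id'] = range(1, mentions.shape[0] + 1)
mentions_name = mentions.index.values
mentions_id = mentions['id'].values
mentions_count = mentions['mention_count'].values
# Normalize the counts so that they sum to 1
mentions_count = mentions_count / np.sum(mentions_count)
mentions_id = [int(x) for x in mentions_id]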
Here we calculate how often each user handle is mentioned in Donald Trump's tweets. Based on this frequency, we plot a network graph in which the central node is the mentioner and every mentioned handle is connected to it.
Moreover, the distance of each node from the center is decided by this frequency: the more often a handle is mentioned, the closer its node is to the central node.
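One way to turn mention frequencies into graph edges with a frequency-based distance is sketched below. The text above does not state the exact formula, so the choice `distance = 1 - share` is an assumption of this sketch, as are the helper name `mention_edges` and the sample handles.

```python
from collections import Counter

def mention_edges(central, mentioned_handles):
    """Build (central, handle, share, distance) tuples for a star-shaped graph.

    share is the handle's fraction of all mentions. distance = 1 - share
    is an assumed mapping that puts frequently mentioned handles closer
    to the central node.
    """
    counts = Counter(h.strip().lower() for h in mentioned_handles)
    counts.pop(central.lower(), None)  # ignore self-mentions
    total = sum(counts.values())
    edges = []
    for handle, count in counts.most_common():
        share = count / total
        edges.append((central, handle, share, 1.0 - share))
    return edges

# Invented handles for illustration only.
sample_handles = ["nytimes", "foxnews", "nytimes", "cnn", "nytimes", "realdonaldtrump"]
edges = mention_edges("realdonaldtrump", sample_handles)
for edge in edges:
    print(edge)
```

The resulting tuples can be handed to any graph-drawing library as weighted edges from the central node.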
After analyzing the tweets of the two presidents, we find that Trump is, at first glance, a playboy who says whatever he wants to say and does whatever he intends to do, but also an ambitious man who aspires to make America great again.
Obama, however, is like a perfect guy who is self-disciplined, who cares about and loves the people around him, and who always fights for freedom and democracy.
So who can exactly represent real American people?
We think they both can because they are just like the people in America who work very hard for the family, and for the country.
After digging into the tweets they sent, we see them not only as the shining presidents on TV but also as brothers who live in the neighborhood. We believe this might be the meaning of data: it tells you not merely the truth but also the story behind the scenes.
As we mentioned before, this project requires strong natural language processing skills.
Cleaning data is tedious, and analyzing and visualizing words is also tough, but everything pays off when three Python class students, three foreigners, get to know the two great American presidents in such a way. How lucky we are to live in this information age!
Thanks, Python. Thanks, Ms. Dahlin! HOPE TO SEE YOU SOON!